---
title: Sentence Transformer
description: 'Local reranking with HuggingFace cross-encoder models'
---

The Sentence Transformer reranker provides local reranking with HuggingFace cross-encoder models, making it well suited for privacy-focused deployments where data must stay on-premises.

## Models

Any HuggingFace cross-encoder model can be used; the standalone scoring sketch after this list shows what these models compute. Popular choices include:

- **`cross-encoder/ms-marco-MiniLM-L-6-v2`**: Default, good balance of speed and accuracy
- **`cross-encoder/ms-marco-TinyBERT-L-2-v2`**: Fastest, smaller model size
- **`cross-encoder/ms-marco-electra-base`**: Higher accuracy, larger model
- **`cross-encoder/stsb-distilroberta-base`**: Good for semantic similarity tasks
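
Unlike embedding models, a cross-encoder scores each query-document pair jointly in a single forward pass, which is what makes it effective for reranking. A minimal standalone sketch of that scoring step, using the default model from the list above:

```python Python
from sentence_transformers import CrossEncoder

# Load the cross-encoder; downloads from HuggingFace on first use
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# Each (query, document) pair is scored jointly, not via precomputed embeddings
pairs = [
    ("what books does the user like", "I love reading science fiction novels"),
    ("what books does the user like", "My car needs an oil change"),
]
scores = model.predict(pairs)  # one relevance score per pair
print(scores)  # the sci-fi document should score noticeably higher
```

Higher scores mean higher predicted relevance; the reranker sorts retrieved memories by these scores.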

## Installation

```bash
pip install sentence-transformers
```
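
To verify the install, import the package and print its version:

```python Python
import sentence_transformers

# Confirms the package is importable and shows the installed version
print(sentence_transformers.__version__)
```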

## Configuration

```python Python
from mem0 import Memory

config = {
    "vector_store": {
        "provider": "chroma",
        "config": {
            "collection_name": "my_memories",
            "path": "./chroma_db"
        }
    },
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4o-mini"
        }
    },
    "rerank": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cpu",  # or "cuda" for GPU
            "batch_size": 32,
            "show_progress_bar": False,
            "top_k": 5
        }
    }
}

memory = Memory.from_config(config)
```

## GPU Acceleration

For better performance, use GPU acceleration:

```python Python
config = {
    "rerank": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cuda",  # Use GPU
            "batch_size": 64   # high batch size for high memory GPUs
        }
    }
}
```
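
Rather than hard-coding the device, you can pick it at runtime. A minimal sketch, assuming PyTorch (installed as a dependency of `sentence-transformers`) is available:

```python Python
import torch

# Fall back to CPU when no CUDA device is present
device = "cuda" if torch.cuda.is_available() else "cpu"

config = {
    "rerank": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": device,
            # Smaller batches are safer on CPU; raise this on a large GPU
            "batch_size": 64 if device == "cuda" else 16,
        }
    }
}
```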

## Usage Example

```python Python
from mem0 import Memory

# Initialize memory with local reranker
config = {
    "vector_store": {"provider": "chroma"},
    "llm": {"provider": "openai", "config": {"model": "gpt-4o-mini"}},
    "rerank": {
        "provider": "sentence_transformer",
        "config": {
            "model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
            "device": "cpu"
        }
    }
}

memory = Memory.from_config(config)

# Add memories
messages = [
    {"role": "user", "content": "I love reading science fiction novels"},
    {"role": "user", "content": "My favorite author is Isaac Asimov"},
    {"role": "user", "content": "I also enjoy watching sci-fi movies"}
]

memory.add(messages, user_id="charlie")

# Search with local reranking
results = memory.search("What books does the user like?", user_id="charlie")

for result in results['results']:
    print(f"Memory: {result['memory']}")
    print(f"Vector Score: {result['score']:.3f}")
    print(f"Rerank Score: {result['rerank_score']:.3f}")
    print()
```

## Custom Models

You can use any HuggingFace cross-encoder model:

```python Python
# Using a different model
config = {
    "rerank": {
        "provider": "sentence_transformer", 
        "config": {
            "model": "cross-encoder/stsb-distilroberta-base",
            "device": "cpu"
        }
    }
}
```
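
Since the provider loads the model through `sentence-transformers`, a local directory containing a saved cross-encoder checkpoint should also work, which is useful for air-gapped deployments. A sketch with a hypothetical path:

```python Python
# "/models/my-finetuned-cross-encoder" is a hypothetical local path;
# any directory holding a saved cross-encoder checkpoint is loaded the same way
config = {
    "rerank": {
        "provider": "sentence_transformer",
        "config": {
            "model": "/models/my-finetuned-cross-encoder",
            "device": "cpu"
        }
    }
}
```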

## Configuration Parameters

| Parameter | Description | Type | Default |
|-----------|-------------|------|---------|
| `model` | HuggingFace cross-encoder model name | `str` | `"cross-encoder/ms-marco-MiniLM-L-6-v2"` |
| `device` | Device to run model on (`cpu`, `cuda`, etc.) | `str` | `None` |
| `batch_size` | Batch size for processing documents | `int` | `32` |
| `show_progress_bar` | Show progress bar during processing | `bool` | `False` |
| `top_k` | Maximum documents to return | `int` | `None` |

## Advantages

- **Privacy**: Complete local processing, no external API calls
- **Cost**: No per-token or per-request API fees; you only pay for local compute
- **Customization**: Use any HuggingFace cross-encoder model
- **Offline**: Works without an internet connection after the initial model download (see the cache-warming sketch below)
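
To prepare for offline use, load the model once while online so it lands in the local HuggingFace cache. A minimal sketch:

```python Python
from sentence_transformers import CrossEncoder

# Instantiating the model downloads and caches it (typically under
# ~/.cache/huggingface); later runs load from the cache without network access
CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
```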

## Performance Considerations

- **First Run**: The model is downloaded and cached on first use, which can take a while
- **Memory Usage**: The model is loaded into CPU or GPU memory; larger models need more
- **Batch Size**: Tune `batch_size` to the memory you have available (a timing sketch follows this list)
- **Device**: GPU acceleration significantly improves speed
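
To tune the batch size empirically, time the same workload at a few settings. A rough sketch using the cross-encoder directly:

```python Python
import time
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [("sample query", f"candidate document {i}") for i in range(256)]

# Compare wall-clock time across batch sizes; pick the largest one
# that fits in memory without slowing down
for batch_size in (8, 32, 64):
    start = time.perf_counter()
    model.predict(pairs, batch_size=batch_size)
    print(f"batch_size={batch_size}: {time.perf_counter() - start:.2f}s")
```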

## Best Practices

1. **Model Selection**: Choose a model based on your accuracy-versus-speed requirements
2. **Device Management**: Use GPU when available for better performance
3. **Batch Processing**: Process multiple documents together for efficiency
4. **Memory Monitoring**: Watch system memory usage when using larger models (a small sketch follows this list)
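
For the memory-monitoring point above, a small sketch that measures resident memory growth as a model loads, assuming `psutil` is installed (`pip install psutil`):

```python Python
import psutil
from sentence_transformers import CrossEncoder

def rss_mb() -> float:
    # Resident set size of the current process, in megabytes
    return psutil.Process().memory_info().rss / 1e6

before = rss_mb()
model = CrossEncoder("cross-encoder/ms-marco-electra-base")  # larger model
print(f"Loading the model grew RSS by about {rss_mb() - before:.0f} MB")
```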